1 Introduction

This is my final for ESS 580A7, Introduction to Environmental Data Science. Each chapter is an assignment from each week of the course.

The beginning of each assignment contains code written by Matthew Ross and Nathan Mueller.

2 Assignment 1: RMarkdown Examples

title: "Discharge Data Example"
date: "Last compiled on March 14, 2022"
output:
  html_document:
    toc: true
    toc_float: true
    toc_depth: 3

2.1 Methods

The Poudre River at Lincoln Bridge is:

  • Downstream of only a little bit of urban stormwater

  • Near Odell Brewing CO

  • Near an open space area and the Poudre River Trail

  • Downstream of many agricultural diversions

2.1.1 Site Description

2.2 Data Acquisition and Plotting Tests

2.2.1 Data Download

#Download the Data
q <- readNWISdv(siteNumbers = '06752260',
                parameterCd = '00060',
                startDate = '2017-01-01',
                endDate = '2022-01-01') %>%
  rename(q = 'X_00060_00003')

2.2.2 Static Data Plotter

#Plot the Data with ggplot
ggplot(q, aes(x = Date, y = q)) + 
  geom_line() + 
  ylab('Q (cfs)') + 
  ggtitle('Discharge in the Poudre River, Fort Collins')

2.2.3 Interactive Data Plotter

#Plot A Dygraph
q_xts <- xts(q$q, order.by = q$Date)


dygraph(q_xts) %>%
  dyAxis("y", label = "Discharge (cfs)") 

2.3 Assignment

This assignment will be primarily about demonstrating some expertise in using RMarkdown, since we will be using Rmds as the primary form of homework and assignments. With that in mind, your assignment for this homework is to:

2.3.1 Question 1-3:

1: Fork the example repository into your personal GitHub (Completed)

2: Create an RStudio project from your Personal clone of the Repo (Completed)

3: Create a table of contents that is floating, but displays three levels of headers instead of two (by editing the content at the beginning of the document) (Completed)

2.3.2 Question 4:

Make a version of the dygraph with points and lines by using RStudio’s dygraphs guide

#Plot A Dygraph
dygraph(q_xts) %>% 
  dyOptions(drawPoints = TRUE, pointSize = 2) %>%
  dyAxis("y", label = "Discharge (cfs)") 

2.3.3 Question 5:

Write a paragraph on the Poudre River with at least three hyperlinks, two bolded sections, and one italicized phrase. The content of this paragraph is not vital, but try to at least make it true and interesting, and, of course, don’t plagiarize.

The Poudre Itself

The Cache la Poudre River, also known as The Poudre, is a river beloved by Fort Collins residents. Beginning in Rocky Mountain National Park and ending when it merges with the South Platte River near Greeley, CO, the Poudre spans 76 miles. From start to finish, the Poudre drops about 7,000 feet in elevation.

The Poudre & Recreation

The Poudre River provides Northern Colorado with many types of recreation, including whitewater rafting, fishing, and floating. For whitewater rafting, the Poudre offers options for multiple ability levels. From easy Class I rapids to a Class V waterfall rapid, you can pick and choose from different trips to tailor the experience to your comfort and ability. For more information or to book a trip, check out this rafting site. If whitewater rafting isn’t your thing, you may enjoy fly fishing. The Poudre is well known for its trout-rich waters, so no matter where along the river you decide to stop and drop a line, you’ll find fish; this guide has a thorough map with locations, as well as information about rules and regulations. If being in the water doesn’t float your boat, there are many ways to enjoy the Poudre while staying dry, such as hiking, hammocking, biking, and walking along it. The Poudre has carved a beautiful canyon over time, with many “gulches”. These gulches are particularly beautiful and make for great hikes. Once you’ve done a hike and need a break, the Poudre also has numerous picnic spots that make for great hammocking. Whether you embark on a great adventure along the Poudre or just head up the canyon for a relaxing drive, you’ll have a good day.

2.3.4 Question 6-9:

6: Knit that document, and then git commit and push to your personal GitHub. (Completed)

7: Use the GitHub -> Settings -> Pages tab to create a website of your report. (Completed)

8: Bonus, make the timestamp in the header dynamic, so that it always shows today’s date rather than a static date you enter. (Completed)

9: Bonus, create an “index_talk.Rmd” version of your document using the revealjs package. Add a link to your original report-style document. (Completed)
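For the dynamic timestamp in Question 8, one common approach is inline R inside the YAML date field; this is a sketch, and the exact format string is an assumption:

```yaml
date: "Last compiled on `r format(Sys.Date(), '%B %d, %Y')`"
```

Each time the document is knit, `Sys.Date()` is evaluated, so the header always shows the compile date.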
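For the revealjs bonus in Question 9, a minimal `index_talk.Rmd` header might look like the following sketch (the title is illustrative):

```yaml
---
title: "Discharge Data Example (talk)"
output: revealjs::revealjs_presentation
---
```

A link back to the report-style document can then be added in the body with ordinary Markdown link syntax.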

3 Assignment 2: Fire Data Wrangle

title: "Hayman Fire Recovery"
author: "Samantha Clark"
date: "10/3/2019"
output:
  html_document: default
  pdf_document: default

# Prepare Libraries Needed
library(tidyverse)
library(tidyr)
library(ggthemes)
library(lubridate)
library(knitr)

# Now that we have learned how to munge (manipulate) data
# and plot it, we will work on using these skills in new ways

knitr::opts_knit$set(root.dir='.')
####-----Reading in Data and Stacking it ----- ####
#Reading in files
files <- list.files('./data',full.names=T)


#Read in individual data files
ndmi <- read_csv(files[1]) %>% 
  rename(burned=2,unburned=3) %>%
  mutate(data='ndmi')


ndsi <- read_csv(files[2]) %>% 
  rename(burned=2,unburned=3) %>%
  mutate(data='ndsi')

ndvi <- read_csv(files[3])%>% 
  rename(burned=2,unburned=3) %>%
  mutate(data='ndvi')

# Stack as a tidy dataset
full_long <- rbind(ndvi,ndmi,ndsi) %>%
  gather(key='site',value='value',-DateTime,-data) %>%
  filter(!is.na(value))

3.1 Question 1

What is the correlation between NDVI and NDMI? Here I want you to convert the full_long dataset into a wide dataset using the function “spread”, and then make a plot that shows the correlation as a function of whether the site was burned or not (the x axis should be ndmi). You should exclude winter months and focus on summer months.

# Convert data from long format to wide format
full_wide <- pivot_wider(full_long, names_from = "data", values_from = "value")

# Select months and plot NDVI vs NDMI
full_wide %>%
  mutate(month=month(DateTime), year=year(DateTime)) %>%
  filter(!month %in% c(9,10,11,12,1,2,3,4,5)) %>%
  ggplot(aes(x = ndmi, y = ndvi, color = site)) +
    geom_point() +
    geom_smooth() +
    ggtitle('Figure 1: NDVI vs NDMI') +
    xlab('NDMI') +
    ylab('NDVI')
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 11 rows containing non-finite values (stat_smooth).
## Warning: Removed 11 rows containing missing values (geom_point).

Overall, NDMI and NDVI are positively correlated, so as moisture increases, so does vegetation. However, at a certain point, when moisture becomes too high, vegetation no longer increases and either plateaus or decreases. This is understandable in instances of snow cover, or if vegetation thrives only within a specific moisture range: if the NDMI is above the vegetation’s upper moisture limit, the vegetation may begin to suffer, and therefore NDVI would decrease. There is also a difference between the trends for the unburned and burned sites. Overall, vegetation is lower at the burned site no matter the moisture level. This makes sense, as the burned site has fire damage, which will impact vegetation for years after the fire. Overall, both trends are positively correlated up to about the same point.
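The correlation can also be quantified directly rather than only read off the plot. This is a sketch using the same summer-filtered full_wide data; `use = 'complete.obs'` is assumed to handle the NA rows noted in the plot warnings:

```r
# Pearson correlation of NDMI and NDVI per site, summer months only
full_wide %>%
  mutate(month = month(DateTime)) %>%
  filter(month %in% c(6, 7, 8)) %>%
  group_by(site) %>%
  summarize(r = cor(ndmi, ndvi, use = 'complete.obs'))
```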

3.2 Question 2

What is the correlation between average NDSI (normalized snow index) for January - April and average NDVI for June-August? In other words, does the previous year’s snow cover influence vegetation growth for the following summer?

# Convert NDVI dataset
ndvi_long <- pivot_longer(ndvi, cols = c(burned, unburned), names_to = "site", values_to = "value") %>%
  filter(!is.na(value)) %>%
  transform(value = as.numeric(value))

#find averages for each year, for June- August
NDVI_avgs <- ndvi_long %>%
  mutate(month=month(DateTime), year=year(DateTime)) %>%
  filter(month %in% c(6,7,8)) %>%
  group_by(year, data) %>%
  summarize(mean = mean(value))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# Convert NDSI dataset
ndsi_long <- pivot_longer(ndsi, cols = c(burned, unburned), names_to = "site", values_to = "value") %>%
  filter(!is.na(value)) %>%
  transform(value = as.numeric(value))

#find averages for each year, for January-April
NDSI_avgs <- ndsi_long %>%
  mutate(month=month(DateTime), year=year(DateTime)) %>%
  filter(month %in% c(1,2,3,4)) %>%
  group_by(year, data) %>%
  summarize(mean = mean(value))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
#merge datasets
finaldf = rbind(NDSI_avgs, NDVI_avgs)

#plot
ggplot(finaldf, aes(x = year, y = mean, color = data)) +
  geom_point() +
  geom_line() +
  ggtitle('Figure 2: Snow Effect on Vegetation') +
  xlab('Year') +
  ylab('Mean')

It appears that there is no trend between snowfall in January-April and vegetation in June-August. The range in snow index does not have a large effect on vegetation index. For example, the large drop in average snow index in 1988 does not have an impact on the vegetation. This is an interesting lack of relationship, as I would expect the amount of snow to have a large impact on the greenness for the upcoming season.
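The lack of relationship can also be checked numerically by joining the two sets of yearly means and computing a correlation. This is a sketch built on the NDSI_avgs and NDVI_avgs objects above; the ndsi_mean and ndvi_mean names are hypothetical:

```r
# Correlate winter snow index with the following summer's vegetation index
inner_join(
  rename(NDSI_avgs, ndsi_mean = mean),
  rename(NDVI_avgs, ndvi_mean = mean),
  by = 'year'
) %>%
  ungroup() %>%
  summarize(r = cor(ndsi_mean, ndvi_mean))
```

A value of r near zero would support the visual conclusion that winter snow cover does not predict summer greenness here.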

3.3 Question 3

How is the snow effect from question 2 different between pre- and post-burn and burned and unburned?

#find averages for each year, for January-April
NDSI_avgs2 <- ndsi_long %>%
  mutate(month=month(DateTime), year=year(DateTime)) %>%
  filter(month %in% c(1,2,3,4)) %>%
  group_by(year, data, site) %>%
  summarize(mean = mean(value))
## `summarise()` has grouped output by 'year', 'data'. You can override using the `.groups` argument.
#find averages for each year, for June- August
NDVI_avgs2 <- ndvi_long %>%
  mutate(month=month(DateTime), year=year(DateTime)) %>%
  filter(month %in% c(6,7,8)) %>%
  group_by(year, data, site) %>%
  summarize(mean = mean(value))
## `summarise()` has grouped output by 'year', 'data'. You can override using the `.groups` argument.
#merge datasets
finaldf2 = rbind(NDSI_avgs2, NDVI_avgs2)

#plot
ggplot(finaldf2, aes(x = year, y = mean, color = data, shape = site)) +
  geom_point() +
  geom_line() +
  ggtitle('Figure 3: Snow Effect on Vegetation At Two Sites Before and After 2003 Fire') +
  xlab('Year') +
  ylab('Mean')

The fire occurred in 2003, and before the fire there is not a large difference between the NDVI for the (future) burned vs unburned areas. After the fire, there is a large difference between NDVI for burned vs unburned sites, which is to be expected as fires can impact vegetation for decades after they happen. Post-burn, it appears that NDSI has a larger impact on NDVI than pre-burn. This would make sense, as post-fire vegetation is likely more susceptible to its surroundings and less able to adapt.

3.4 Question 4

What month is the greenest month on average?

#create month column
monthndvi <- ndvi_long %>%
  mutate(month=month(DateTime))

#find means by month
monthlymeans_NDVI <- monthndvi %>%
  group_by(month) %>%
  summarize(mean = mean(value))

#create a table
kable(monthlymeans_NDVI, caption = 'NDVI Averages by Month')
Table: NDVI Averages by Month

| month |      mean |
|------:|----------:|
|     1 | 0.2186012 |
|     2 | 0.1986539 |
|     3 | 0.2019000 |
|     4 | 0.2501716 |
|     5 | 0.3036180 |
|     6 | 0.3512811 |
|     7 | 0.3633104 |
|     8 | 0.3871568 |
|     9 | 0.3827147 |
|    10 | 0.3570475 |
|    11 | 0.2852874 |
|    12 | 0.2513979 |

NDVI is the normalized difference vegetation index, and measures the density of green vegetation on land. As such, I can find the average NDVI for each month and determine which month, on average, is the greenest. The averages range from 0.1986 (February) to 0.3871 (August). In other words, August was the greenest month on average. The second greenest was September at 0.3827. The least green month was February at 0.1986.

3.5 Question 5

What month is the snowiest on average?

#create a month column
monthndsi <- ndsi_long %>%
  mutate(month=month(DateTime))

#find means by month
monthlymeans_NDSI <- monthndsi %>%
  group_by(month) %>%
  summarize(mean = mean(value))

#create a table
kable(monthlymeans_NDSI, caption = 'NDSI Averages by Month')
Table: NDSI Averages by Month

| month |       mean |
|------:|-----------:|
|     1 |  0.2099584 |
|     2 |  0.1988183 |
|     3 | -0.0420229 |
|     4 | -0.2278234 |
|     5 | -0.4105171 |
|     6 | -0.4409882 |
|     7 | -0.4504829 |
|     8 | -0.4594660 |
|     9 | -0.4539411 |
|    10 | -0.4085219 |
|    11 | -0.1151240 |
|    12 |  0.1241000 |

NDSI is the normalized difference snow index, and detects the presence of snow on land. As such, I can find the average NDSI for each month and determine which month, on average, is the snowiest. The averages range from -0.4594 (August) to 0.2099 (January). In other words, January was the snowiest month on average. The second snowiest was February at 0.1988. The least snowy month was August at -0.4594.

3.6 Bonus Question: Redo all problems with spread and gather using modern tidyverse syntax. (Completed)
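As a sketch of the modern syntax, the gather() call used to build full_long can be replaced with pivot_longer(), its successor in tidyr 1.0+. This assumes the same ndvi, ndmi, and ndsi objects read in above; full_long2 is a hypothetical name:

```r
# Stack the three index datasets, then lengthen with pivot_longer
# (the modern replacement for gather)
full_long2 <- rbind(ndvi, ndmi, ndsi) %>%
  pivot_longer(cols = c(burned, unburned),
               names_to = 'site',
               values_to = 'value') %>%
  filter(!is.na(value))
```

The earlier pivot_wider() call already mirrors this on the spread() side, so the two functions together cover the whole bonus.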

4 Assignment 3: Snow Functions Iteration

title: "Snow Data Assignment: Web Scraping, Functions, and Iteration"
author: "Samantha Clark"
date: "2-7-2022"
output: html_document

4.1 Simple web scraping

R can read HTML using the rvest, xml, or xml2 packages. Here we are going to navigate to the Center for Snow and Avalanche Studies website and read in a table. This table contains links to data we want to programmatically download for three sites. We don’t know much about these sites, but they contain incredibly rich snow, temperature, and precipitation data.

4.1.1 Reading an html

4.1.2 Data Download

4.1.2.1 Download data in a for loop

#Grab only the name of the file by splitting out on forward slashes
splits <- str_split_fixed(links,'/',8)

#Keep only the 8th column
dataset <- splits[,8] 

#generate a file list for where the data goes
file_names <- paste0('data/',dataset)

for(i in 1:3){
  download.file(links[i],destfile=file_names[i])
}

downloaded <- file.exists(file_names)

evaluate <- !all(downloaded)

4.1.2.2 Download data in a map

#Map version of the same for loop (downloading 3 files)
if(evaluate == T){
  map2(links[1:3],file_names[1:3],download.file)
}else{print('data already downloaded')}
## [[1]]
## [1] 0
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 0

4.1.3 Data read-in

4.1.3.1 Read in just the snow data as a loop

#Pattern matching to only keep certain files
snow_files <- file_names %>%
  .[!grepl('SG_24',.)] %>%
  .[!grepl('PTSP',.)]

#empty_data <- list()

# snow_data <- for(i in 1:length(snow_files)){
#   empty_data[[i]] <- read_csv(snow_files[i]) %>%
#     select(Year,DOY,Sno_Height_M)
# }

#snow_data_full <- do.call('rbind',empty_data)

#summary(snow_data_full)

4.1.3.2 Read in the data as a map function

#Create a map function
our_snow_reader <- function(file){
  name = str_split_fixed(file,'/',2)[,2] %>%
    gsub('_24hr.csv','',.)
  df <- read_csv(file) %>%
    select(Year,DOY,Sno_Height_M) %>%
    mutate(site = name)
}
#read in the data with function
snow_data_full <- map_dfr(snow_files,our_snow_reader)
## Rows: 6211 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (52): ArrayID, Year, DOY, Hour, LoAir_Min_C, LoAir_Min_Time, LoAir_Max_C...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 6575 Columns: 48
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (48): ArrayID, Year, DOY, Hour, LoAir_Min_C, LoAir_Min_Time, LoAir_Max_C...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(snow_data_full)
##       Year           DOY         Sno_Height_M        site          
##  Min.   :2003   Min.   :  1.0   Min.   :-3.523   Length:12786      
##  1st Qu.:2008   1st Qu.: 92.0   1st Qu.: 0.350   Class :character  
##  Median :2012   Median :183.0   Median : 0.978   Mode  :character  
##  Mean   :2012   Mean   :183.1   Mean   : 0.981                     
##  3rd Qu.:2016   3rd Qu.:274.0   3rd Qu.: 1.520                     
##  Max.   :2021   Max.   :366.0   Max.   : 2.905                     
##                                 NA's   :4554

4.1.3.3 Plot snow data

#create a new data set from full data (group data and get means)
snow_yearly <- snow_data_full %>%
  group_by(Year,site) %>%
  summarize(mean_height = mean(Sno_Height_M,na.rm=T))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
#plot the new data set
ggplot(snow_yearly,aes(x=Year,y=mean_height,color=site)) + 
  geom_point() +
  ggthemes::theme_few() + 
  ggthemes::scale_color_few()
## Warning: Removed 2 rows containing missing values (geom_point).

4.2 Assignment:

  1. Extract the meteorological data URLs. Here we want you to use the rvest package to get the URLs for the SASP forcing and SBSP_forcing meteorological datasets.
library(rvest)

site_url1 <- 'https://snowstudies.org/archived-data/'

#Read the web url
webpage1 <- read_html(site_url1)

#extract weblinks and urls
hwlinks <- webpage1 %>%
  html_nodes('a') %>%
  .[grepl('forcing',.)] %>%
  html_attr('href')
hwlinks
## [1] "https://snowstudies.org/wp-content/uploads/2022/02/SBB_SASP_Forcing_Data.txt"
## [2] "https://snowstudies.org/wp-content/uploads/2022/02/SBB_SBSP_Forcing_Data.txt"
  2. Download the meteorological data. Use the download.file and str_split_fixed commands to download the data and save it in your data folder. You can use a for loop or a map function.
#Download data

#Grab only the name of the file by splitting out on forward slashes
hwsplit <- str_split_fixed(hwlinks,'/',8)

#Keep only the 8th column
hwdataset <- hwsplit[,8] 

#generate a file list for where the data goes
hwfilenames <- paste0('data/',hwdataset)

for(i in 1:2){
  download.file(hwlinks[i],destfile=hwfilenames[i])
}

hwdownloaded <- file.exists(hwfilenames)

hwevaluate <- !all(hwdownloaded)
  3. Write a custom function to read in the data and append a site column to the data.
# Grab the variable names from the pdf
library(pdftools)
## Using poppler version 20.12.1
headers <- pdf_text('https://snowstudies.org/wp-content/uploads/2022/02/Serially-Complete-Metadata-text08.pdf') %>%
  readr::read_lines(.) %>%
  trimws(.) %>%
  str_split_fixed(.,'\\.',2) %>%
  .[,2] %>%
  .[1:26] %>%
  str_trim(side = "left") 

#Read in data
forcing_reader <- function(hwfilenames){
  hwdf <- read_fwf(hwfilenames)
  names(hwdf) = headers[1:26]
  hdr1 = str_split_fixed(hwfilenames,'_', 3)[, 2]
  mutate(hwdf, site=hdr1)

}
  4. Use the map function to read in both meteorological files. Display a summary of your tibble.
#use function to read in files
forcing_data_full <- map_dfr(hwfilenames,forcing_reader)
## Rows: 69168 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## 
## chr  (2): X12, X14
## dbl (17): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X13, X15, X16, X17, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: The `value` argument of `names<-` must have the same length as `x` as of tibble 3.0.0.
## `names` must have length 19, not 26.
## Rows: 69168 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## 
## chr  (2): X12, X14
## dbl (17): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X13, X15, X16, X17, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: The `value` argument of `names<-` must have the same length as `x` as of tibble 3.0.0.
## `names` must have length 19, not 26.
#show summary tibble
summary(forcing_data_full)
##       year          month             day             hour           minute 
##  Min.   :2003   Min.   : 1.000   Min.   : 1.00   Min.   : 0.00   Min.   :0  
##  1st Qu.:2005   1st Qu.: 3.000   1st Qu.: 8.00   1st Qu.: 5.75   1st Qu.:0  
##  Median :2007   Median : 6.000   Median :16.00   Median :11.50   Median :0  
##  Mean   :2007   Mean   : 6.472   Mean   :15.76   Mean   :11.50   Mean   :0  
##  3rd Qu.:2009   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:17.25   3rd Qu.:0  
##  Max.   :2011   Max.   :12.000   Max.   :31.00   Max.   :23.00   Max.   :0  
##      second  precip [kg m-2 s-1] sw down [W m-2]     lw down [W m-2]  
##  Min.   :0   Min.   :0.000e+00   Min.   :-9999.000   Min.   :-9999.0  
##  1st Qu.:0   1st Qu.:0.000e+00   1st Qu.:   -3.510   1st Qu.:  173.4  
##  Median :0   Median :0.000e+00   Median :   -0.344   Median :  231.4  
##  Mean   :0   Mean   :3.838e-05   Mean   :-1351.008   Mean   :-1325.7  
##  3rd Qu.:0   3rd Qu.:0.000e+00   3rd Qu.:  294.900   3rd Qu.:  272.2  
##  Max.   :0   Max.   :6.111e-03   Max.   : 1341.000   Max.   :  365.8  
##   air temp [K]   windspeed [m s-1]   relative humidity [%] pressure [Pa]  
##  Min.   :242.1   Min.   :-9999.000   Length:138336         Min.   :63931  
##  1st Qu.:265.8   1st Qu.:    0.852   Class :character      1st Qu.:63931  
##  Median :272.6   Median :    1.548   Mode  :character      Median :65397  
##  Mean   :272.6   Mean   : -790.054                         Mean   :65397  
##  3rd Qu.:279.7   3rd Qu.:    3.087                         3rd Qu.:66863  
##  Max.   :295.8   Max.   :  317.300                         Max.   :66863  
##  specific humidity [g g-1] calculated dewpoint temperature [K]
##  Length:138336             Min.   :   0.0                     
##  Class :character          1st Qu.:   0.0                     
##  Mode  :character          Median :   0.0                     
##                            Mean   : 387.3                     
##                            3rd Qu.:   0.0                     
##                            Max.   :2009.0                     
##  precip, WMO-corrected [kg m-2 s-1]
##  Min.   :   0.0                    
##  1st Qu.:   0.0                    
##  Median :   0.0                    
##  Mean   : 466.2                    
##  3rd Qu.:   0.0                    
##  Max.   :3002.0                    
##  air temp, corrected with Kent et al. (1993) [K]
##  Min.   :   0.000                               
##  1st Qu.:   0.000                               
##  Median :   0.000                               
##  Mean   :   5.754                               
##  3rd Qu.:   0.000                               
##  Max.   :5002.000                               
##  air temp, corrected with Anderson and Baumgartner (1998)[K]
##  Min.   :   0.0                                             
##  1st Qu.:   0.0                                             
##  Median :   0.0                                             
##  Mean   : 654.3                                             
##  3rd Qu.:   0.0                                             
##  Max.   :4009.0                                             
##  air temp, corrected with Nakamura and Mahrt (2005) [K]     site          
##  Min.   :   0.0                                         Length:138336     
##  1st Qu.:   0.0                                         Class :character  
##  Median :   0.0                                         Mode  :character  
##  Mean   : 155.4                                                           
##  3rd Qu.:   0.0                                                           
##  Max.   :6009.0
  5. Make a line plot of mean temp by year by site (using the air temp [K] variable). Is there anything suspicious in the plot? Adjust your filtering if needed.
# prep data, find means
forcing_yearly <- forcing_data_full %>%
  group_by(year, site) %>%
  filter(year %in% c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011)) %>%
  summarize(
    mean = mean(`air temp [K]`)
  )
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# plot
ggplot(forcing_yearly, aes(x = year, y = mean, color = site)) +
  geom_line()

On average, air temperature increased from 2004 to 2011. The SBSP site’s air temperature is, on average, a few degrees cooler than the SASP site’s. 2003 was filtered out because it contains only a small amount of data, all from winter, leading to a very cold mean that is not representative.

  6. Write a function that makes line plots of monthly average temperature at each site for a given year. Use a for loop to make these plots for 2005 to 2010. Are monthly average temperatures at the Senator Beck Study Plot ever warmer than the Snow Angel Study Plot? Hint: https://ggplot2.tidyverse.org/reference/print.ggplot.html
# write a function to plot monthly average air temperature for a given year
lineplotter <- function(df, yr){
  temp_month <- df %>%
    filter(year == yr) %>%
    group_by(year, month, site) %>%
    summarize(meantemp = mean(`air temp [K]`, na.rm = TRUE))

  linegraph <- ggplot(temp_month, aes(x = month, y = meantemp, color = site)) +
    geom_line() +
    labs(x = 'Month', y = 'Average Air Temperature [K]', title = yr)

  print(linegraph)
}

# create a list of years to plot
years <- c(2005, 2006, 2007, 2008, 2009, 2010)

# loop over the years, printing one plot per year
for (yr in years){
  lineplotter(forcing_data_full, yr)
}
## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

Each year’s average air temperature varies in a similar pattern: starting low in January/February, increasing until peaking around July/August, and then decreasing again through the end of the year. This makes sense, as temperatures are lower in winter (the beginning and end of the year) and higher in summer. The SBSP site’s average air temperature is consistently lower than the SASP site’s.

Bonus: Make a plot of average daily precipitation by day of year (averaged across all available years). Color each site.

# compute day of year, then find mean precipitation for each day of year
# (averaged across all available years), by site
dailyprecip <- forcing_data_full %>%
  mutate(doy = yday(as.Date(paste(year, month, day, sep = '-')))) %>%
  group_by(doy, site) %>%
  summarize(meandailyprecip = mean(`precip [kg m-2 s-1]`))

# plot mean daily precipitation by day of year, colored by site
ggplot(dailyprecip, aes(x = doy, y = meandailyprecip, color = site)) +
  geom_point() +
  labs(x = 'Day of Year', y = 'Mean Precip [kg m-2 s-1]')

Bonus #2: Use a function and for loop to create yearly plots of precipitation by day of year. Color each site.

# edit data to have a date column
forcing_data_full$Date <- as.Date(with(forcing_data_full,paste(year,month,day,sep="-")),"%Y-%m-%d")
# write a function to plot precipitation through a given year, colored by site
precipplotter <- function(df, yr){
  precip_days <- df %>%
    filter(year == yr)

  precipgraph <- ggplot(precip_days,
                        aes(x = Date, y = `precip [kg m-2 s-1]`, color = site)) +
    geom_line() +
    labs(x = 'Date', y = 'Precip [kg m-2 s-1]', title = yr)

  print(precipgraph)
}

# create a list of years to plot
years <- c(2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011)

# loop over the years, printing one plot per year
for (yr in years){
  precipplotter(forcing_data_full, yr)
}

5 Assignment 4: LAGOS Spatial Analyses 1

title: "LAGOS Spatial Analysis"
author: "Matthew Ross, completed by Samantha Clark"
date: "2/23/2022"
output: html_document
editor_options:
  chunk_output_type: console

5.1 LAGOS Analysis

5.1.1 Loading in data

#install.packages(c("RApiSerialize", "LAGOSNE", 'USAboundaries'))

#LAGOSNE::lagosne_get(dest_folder = LAGOSNE:::lagos_path())

5.1.1.1 First download and then specifically grab the locus (or site lat longs)

# #Lagos download script
#LAGOSNE::lagosne_get(dest_folder = LAGOSNE:::lagos_path())


#Load in lagos
lagos <- lagosne_load()
## Warning in (function (version = NULL, fpath = NA) : LAGOSNE version unspecified,
## loading version: 1.087.3
#Grab the lake centroid info
lake_centers <- lagos$locus

5.1.1.2 Convert to spatial data

#Look at the column names
#names(lake_centers)

#Look at the structure
#str(lake_centers)

#View the full dataset
#View(lake_centers %>% slice(1:100))

spatial_lakes <- st_as_sf(lake_centers,coords=c('nhd_long','nhd_lat'),
                          crs=4326) %>%
  st_transform(2163)

#Subset for plotting
subset_spatial <- spatial_lakes %>%
  slice(1:100) 

subset_baser <- spatial_lakes[1:100,]

#Dynamic mapviewer
mapview(subset_spatial)

5.1.1.3 Subset to only Minnesota

states <- us_states()

#Plot all the states to check if they loaded
#mapview(states)
minnesota <- states %>%
  filter(name == 'Minnesota') %>%
  st_transform(2163)

#Subset lakes based on spatial position
minnesota_lakes <- spatial_lakes[minnesota,]

#Plotting the first 1000 lakes
minnesota_lakes %>%
  arrange(-lake_area_ha) %>%
    slice(1:1000) %>%
  st_transform(4326) %>%
  mapview()
mapviewOptions(fgb = F)

5.2 In-Class work

5.2.1 1) Show a map outline of Iowa and Illinois (similar to Minnesota map upstream)

#subset to Iowa and Illinois
IAandIL <- states %>%
  filter(name %in% c('Iowa', 'Illinois')) %>%
  st_transform(2163) 

#create a map
mapview(IAandIL)

5.2.2 2) Subset LAGOS data to these sites, how many sites are in Illinois and Iowa combined? How does this compare to Minnesota?

#Subset lakes based on spatial position
IAandIL_lakes <- spatial_lakes[IAandIL,]

nrow(IAandIL_lakes)
## [1] 16466
nrow(minnesota_lakes)
## [1] 29038

Illinois and Iowa have 16,466 sites combined, while Minnesota alone has 29,038 sites, nearly double that number.

5.2.3 3) What is the distribution of lake size in Iowa vs. Minnesota?

  • Here I want to see a histogram plot with lake size on x-axis and frequency on y axis (check out geom_histogram)
  • make histogram log
# subset to just Iowa data
iowa <- states %>%
  filter(name == 'Iowa') %>%
  st_transform(2163)

# create spatial lake data for Iowa
iowa_lakes <- spatial_lakes[iowa,]
iowa_lakes$state <- 'Iowa' 

# get a lakes set for Minnesota
minnesota_lakes$state <- 'Minnesota'

#Combine Iowa and Minnesota Data
IAandMN_lakes <- bind_rows(iowa_lakes, minnesota_lakes)

names(IAandMN_lakes)
##  [1] "lagoslakeid"       "nhdid"             "gnis_name"        
##  [4] "lake_area_ha"      "lake_perim_meters" "nhd_fcode"        
##  [7] "nhd_ftype"         "iws_zoneid"        "hu4_zoneid"       
## [10] "hu6_zoneid"        "hu8_zoneid"        "hu12_zoneid"      
## [13] "edu_zoneid"        "county_zoneid"     "state_zoneid"     
## [16] "elevation_m"       "state"             "geometry"
#Plot the lakes
ggplot(IAandMN_lakes, aes(x = lake_area_ha, fill = state)) +
  geom_histogram() +
  scale_x_log10() +
  ylab('Frequency') +
  xlab('Lake Size in Hectares') +
  facet_wrap(vars(state))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are more lakes in Minnesota than in Iowa, but the distribution of lakes is similar. For both states there are a greater number of smaller lakes, and a smaller number of large lakes.

5.2.4 4) Make an interactive plot of lakes in Iowa and Illinois and color them by lake area in hectares

# Use mapview to create an interactive plot of lakes
mapview(IAandIL_lakes, zcol = "lake_area_ha", at = c(0, 5, 10, 100, 250, 500, 750, 1000, 5000, 10000))

5.2.5 5) What other data sources might we use to understand how reservoirs and natural lakes vary in size in these three states?

During the different seasons, and different conditions such as drought, lakes will vary in the amount of water they hold. It would be insightful to find a data source that included depth information, and variance in size and depth throughout the year. This would be very helpful in understanding how lakes differ between these states. It may also be interesting to see what different states’ definitions are for lakes, and to see if that may influence the data when looking at the number of lakes and their sizes in the different states.

6 Assignment 5: LAGOS Spatial Analyses 2

title: “Lake Water Quality Analysis” author: “Matthew Ross, completed by Samantha Clark” date: “2/23/2022” output: html_document

6.1 LAGOS Analysis

6.1.1 Loading in data

6.1.1.1 First download and then specifically grab the locus (or site lat longs)

#Lagos download script
#lagosne_get(dest_folder = LAGOSNE:::lagos_path(),overwrite=T)

#Load in lagos
lagos <- lagosne_load()


#Grab the lake centroid info
lake_centers <- lagos$locus

# Make an sf object 
spatial_lakes <- st_as_sf(lake_centers,coords=c('nhd_long','nhd_lat'),
                          crs=4326)

#Grab the water quality data
nutr <- lagos$epi_nutr

#Look at column names
#names(nutr)

6.1.1.2 Subset columns nutr to only keep key info that we want

clarity_only <- nutr %>%
  select(lagoslakeid,sampledate,chla,doc,secchi) %>%
  mutate(sampledate = as.character(sampledate) %>% ymd(.))
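For reference, the `ymd()` call in the pipe above is lubridate's year-month-day parser; a small self-contained sketch shows the conversion from character to `Date`:

```r
# Minimal sketch of the date-parsing step above: lubridate::ymd()
# turns "YYYY-MM-DD" strings into Date objects.
library(lubridate)

d <- ymd("2002-07-15")
class(d)   # "Date"
```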

6.1.1.3 Keep sites with at least 200 observations

#Look at the number of rows of dataset
#nrow(clarity_only)

chla_secchi <- clarity_only %>%
  filter(!is.na(chla),
         !is.na(secchi))

# How many observations did we lose?
# nrow(clarity_only) - nrow(chla_secchi)


# Keep only the lakes with at least 200 observations of secchi and chla
chla_secchi_200 <- chla_secchi %>%
  group_by(lagoslakeid) %>%
  mutate(count = n()) %>%
  filter(count > 200)

6.1.1.4 Join water quality data to spatial data

spatial_200 <- inner_join(spatial_lakes,chla_secchi_200 %>%
                            distinct(lagoslakeid,.keep_all=T),
                          by='lagoslakeid')

6.1.1.5 Mean Chl_a map

### Take the mean chl_a and secchi by lake

mean_values_200 <- chla_secchi_200 %>%
  # Take summary by lake id
  group_by(lagoslakeid) %>%
  # take mean chl_a per lake id
  summarize(mean_chl = mean(chla,na.rm=T),
            mean_secchi=mean(secchi,na.rm=T)) %>%
  #Get rid of NAs
  filter(!is.na(mean_chl),
         !is.na(mean_secchi)) %>%
  # Take the log base 10 of the mean_chl
  mutate(log10_mean_chl = log10(mean_chl))

#Join datasets
mean_spatial <- inner_join(spatial_lakes,mean_values_200,
                          by='lagoslakeid') 

#Make a map
mapview(mean_spatial,zcol='log10_mean_chl')

6.2 Class work

6.2.1 1) What is the correlation between Secchi Disk Depth and Chlorophyll a for sites with at least 200 observations?

  • Here, I just want a plot of chla vs secchi for all sites
#plot
ggplot(chla_secchi_200) + aes(x=chla, y=secchi) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

As chlorophyll a increases, Secchi depth decreases.
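The question asks for the correlation itself, while the chunk above only plots the relationship. A minimal sketch of how one could quantify it with `cor()`, using simulated stand-in data (the vectors `chla` and `secchi` here are made up for illustration, since the LAGOS table isn't reproduced in this snippet):

```r
# Simulated stand-in data: chl-a is lognormal, Secchi depth declines
# with log10(chl-a) plus noise, mimicking the pattern in the plot.
set.seed(42)
chla   <- rlnorm(500, meanlog = 2, sdlog = 1)
secchi <- pmax(0.2, 8 - 1.5 * log10(chla) + rnorm(500, sd = 0.8))

# Pearson correlation; negative here, matching the inverse relationship
cor(chla, secchi)
```

On the real data the equivalent call would presumably be `cor(chla_secchi_200$chla, chla_secchi_200$secchi, use = "complete.obs")`.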

6.3 Why might this be the case?

Higher chlorophyll a means more algae in the water. More algae means less water clarity, and therefore a shallower Secchi depth.

6.3.1 2) What states have the most data?

6.3.1.1 2a) First you will need to make a lagos spatial dataset that has the total number of counts per site.

#create count of lakes data set
countoflakes <- nutr %>%
  group_by(lagoslakeid) %>%
  summarize(count = n())

#join with spatial data
twoa <- inner_join(spatial_lakes, countoflakes %>%
                            distinct(lagoslakeid,.keep_all=T),
                          by='lagoslakeid')

6.3.1.2 2b) Second, you will need to join this point dataset to the us_boundaries data.

#join spatial lake count data with states data
states <- us_states()
lakesbystate <- st_join(twoa, states)

6.3.1.3 2c) Then you will want to group by state and sum all the observations in that state and arrange that data from most to least total observations per state.

#Create clean data set showing observations per state
Final <- lakesbystate %>%
  group_by(state_name) %>%
  summarize(
    sum = sum(count)
  ) %>%
  arrange(desc(sum))

The states with the most data are Minnesota (358,137 observations), Wisconsin (145,910), and Michigan (100,683). The next largest are Maine, New York, Vermont, Rhode Island, Missouri, and New Hampshire, each with between 10,000 and 90,000 observations.

6.3.2 3) Is there a spatial pattern in Secchi disk depth for lakes with at least 200 observations?

#keep only the columns we need
clarity_only2 <- nutr %>%
  select(lagoslakeid,sampledate,doc,secchi) %>%
  mutate(sampledate = as.character(sampledate) %>% ymd(.))

#filter NAs
secchi <- clarity_only2 %>%
  filter(!is.na(secchi))

#filter out anything without 200 obs
secchi_200 <- secchi %>%
  group_by(lagoslakeid) %>%
  mutate(count = n()) %>%
  filter(count > 200)

### Take the mean secchi by lake
mean_secchi_200 <- secchi_200 %>%
  # Take summary by lake id
  group_by(lagoslakeid) %>%
  # take mean of secchi per lake id
  summarize(mean_secchi=mean(secchi,na.rm=T)) %>%
  #Get rid of NAs
  filter(!is.na(mean_secchi))

#Join datasets
mean_spatial_secchi <- inner_join(spatial_lakes,mean_secchi_200,
                          by='lagoslakeid') 

#Make a map colored by mean Secchi depth
mapview(mean_spatial_secchi, zcol = 'mean_secchi')

The majority of lakes with at least 200 observations are in the northern US, specifically in Minnesota, Michigan, and Maine.

7 Assignment 6: Weather Corn Regressions

title: “Weather and Corn Yield Regressions” author: “Nathan Mueller” date: “2/25/2022” output: html_document

7.1 Weather Data Analysis

7.1.1 Load the PRISM daily maximum temperatures

# daily max temperature
# dimensions: counties x days x years
prism <- readMat("~/Desktop/R/ESS580A7/prismiowa.mat")

# look at county #1
t_1981_c1 <- prism$tmaxdaily.iowa[1,,1]
t_1981_c1[366]   # 1981 is not a leap year, so day 366 is NaN
## [1] NaN
plot(1:366, t_1981_c1, type = "l")

ggplot() +
  geom_line(mapping = aes(x=1:366, y = t_1981_c1)) +
  theme_bw() +
  xlab("day of year") +
  ylab("daily maximum temperature (°C)") +
  ggtitle("Daily Maximum Temperature, Iowa County #1")
## Warning: Removed 1 row(s) containing missing values (geom_path).

# assign dimension names to tmax matrix
dimnames(prism$tmaxdaily.iowa) <- list(prism$COUNTYFP, 1:366, prism$years)

# convert the 3D matrix into a data frame
tmaxdf <- as.data.frame.table(prism$tmaxdaily.iowa)

# relabel the columns
colnames(tmaxdf) <- c("countyfp","doy","year","tmax")
tmaxdf <- tibble(tmaxdf)
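The `as.data.frame.table()` step above can be hard to visualize; a toy sketch with a tiny made-up 2 × 3 × 2 array (two counties, three days, two years) shows how each array cell becomes one long-format row:

```r
# Toy 3-D array with the same dimension order as the PRISM data:
# counties x days x years. Values 1:12 are arbitrary.
tmax <- array(1:12, dim = c(2, 3, 2),
              dimnames = list(c("001", "003"), 1:3, c("1981", "1982")))

# Flatten to long format: one row per county-day-year combination
long <- as.data.frame.table(tmax)
colnames(long) <- c("countyfp", "doy", "year", "tmax")

nrow(long)   # 12 rows: 2 counties * 3 days * 2 years
head(long)
```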

7.3 Assignment

7.3.1 Question 1a: Extract Winneshiek County corn yields, fit a linear time trend, make a plot. Is there a significant time trend?

# Filter for Winneshiek County
winnecorn <- cornyields %>%
  filter(county_name == 'WINNESHIEK')

# Fit a linear time trend
lm_winnecorn <- lm(yield ~ year, winnecorn)
summary(lm_winnecorn)
## 
## Call:
## lm(formula = yield ~ year, data = winnecorn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.163  -1.841   2.363   9.437  24.376 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4763.290    448.286  -10.63 4.46e-13 ***
## year            2.457      0.224   10.96 1.77e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.97 on 39 degrees of freedom
## Multiple R-squared:  0.7551, Adjusted R-squared:  0.7488 
## F-statistic: 120.2 on 1 and 39 DF,  p-value: 1.767e-13
# Make a Plot
ggplot(winnecorn) + aes(x=year, y= yield) + geom_point() + geom_smooth(method = lm) + xlab('Year') + ylab('Yield') + ggtitle('Winneshiek County Corn Yield Over Time')
## `geom_smooth()` using formula 'y ~ x'

Yes. The time trend is strongly significant (p ≈ 1.8e-13), with yield increasing by an estimated 2.46 units per year.

7.3.2 Question 1b: Fit a quadratic time trend (i.e., year + year^2) and make a plot. Is there evidence for slowing yield growth?

# Create Year Squared Column
winnecorn$yearsq <- winnecorn$year^2

#LM Fit
lm_winnecornquad <- lm(yield ~ year + yearsq, winnecorn)
summary(lm_winnecornquad)
## 
## Call:
## lm(formula = yield ~ year + yearsq, data = winnecorn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.384  -3.115   1.388   9.743  25.324 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.583e+04  8.580e+04   0.301    0.765
## year        -2.812e+01  8.576e+01  -0.328    0.745
## yearsq       7.641e-03  2.143e-02   0.357    0.723
## 
## Residual standard error: 17.17 on 38 degrees of freedom
## Multiple R-squared:  0.7559, Adjusted R-squared:  0.7431 
## F-statistic: 58.84 on 2 and 38 DF,  p-value: 2.311e-12
winnecorn$fitted <- lm_winnecornquad$fitted.values

# Make a Plot
ggplot(winnecorn) +
  geom_point(mapping = aes(x = year, y = yield)) +
  geom_line(mapping = aes(x = year, y = fitted)) + xlab ('Year') + ylab('Yield')

No. The year^2 coefficient is small and not significant (p = 0.72), so there is no evidence of slowing yield growth.

7.3.3 Question 2 – Time Series: Let’s analyze the relationship between temperature and yields for the Winneshiek County time series. Use data on yield and summer avg Tmax. Is adding year or Tmax^2 to your model helpful? Make a plot and interpret the results.

# Combine Yield and Temp Data Sets
winnecombo <- inner_join(winnecorn, winnesummer, by = 'year')

# Create TMaxSq Column in Combo Data Set
winnecombo$Tmaxsq <- winnecombo$meantmax^2

# Lm for yield and temp
lm_TempYield <- lm(yield ~ year + meantmax + Tmaxsq, winnecombo)
summary(lm_TempYield)
## 
## Call:
## lm(formula = yield ~ year + meantmax + Tmaxsq, data = winnecombo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.683  -5.427   1.214   8.943  25.127 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7280.2956   755.2808  -9.639 2.96e-11 ***
## year            2.3286     0.2166  10.752 1.77e-12 ***
## meantmax      208.9004    52.9773   3.943 0.000381 ***
## Tmaxsq         -3.9259     0.9799  -4.006 0.000318 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.27 on 34 degrees of freedom
## Multiple R-squared:  0.8277, Adjusted R-squared:  0.8125 
## F-statistic: 54.43 on 3 and 34 DF,  p-value: 4.538e-13
winnecombo$fitted <- lm_TempYield$fitted.values

# Make a plot
ggplot(winnecombo) + geom_point(mapping = aes(x= year, y= yield)) + geom_line(mapping = aes(x = year, y= fitted)) + xlab ('Year') +ylab('Yield')

Yield increases over time, and both temperature terms are significant. The positive meantmax coefficient and negative Tmaxsq coefficient suggest that yield rises with summer temperature up to an optimum and declines beyond it.
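As a quick check on that quadratic fit: for a model of the form y = a + b*Tmax + c*Tmax^2 with c < 0, the yield-maximizing temperature is -b / (2c). Plugging in the coefficients printed in the summary above:

```r
# Coefficients copied from the regression summary above
b <- 208.9004   # meantmax coefficient
c <- -3.9259    # Tmaxsq coefficient

# Vertex of the downward-opening parabola: the implied optimum summer Tmax
optimum_tmax <- -b / (2 * c)
round(optimum_tmax, 1)   # about 26.6 degrees C
```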

7.3.4 Question 3 – Cross-Section: Analyze the relationship between temperature and yield across all counties in 2018. Is there a relationship? Interpret the results.

#Filter corn for only 2018
corn2018 <- cornyields %>%
  filter(year == 2018) %>%
  rename(countyfp = "county_ansi")

#Filter tmaxdf for 2018, get means
tmax2018 <- tmaxdf %>%
  filter(year == 2018) %>%
  filter(!is.na(tmax)) %>%
  filter(doy >= 152 & doy <=243) %>%
  group_by(countyfp, year) %>%
  summarize(
    mean = mean(tmax))
## `summarise()` has grouped output by 'countyfp'. You can override using the `.groups` argument.
#factor to numeric
tmax2018$countyfp <- as.numeric(as.character(tmax2018$countyfp))

#combine the data sets
countyyields <- left_join(corn2018, tmax2018, by='countyfp')

#plot
ggplot(countyyields) + aes(x= mean, y= yield) + geom_point() +geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

There is a positive relationship between yield and average temperature up to a point (about 28.2 degrees C), and a negative relationship beyond it. At first, as temperature increases so does yield, but as temperature continues to increase yield begins to decrease. This could reflect the ideal temperature range of corn: once temperatures rise beyond that range, yields suffer. It could also be due to other factors not captured by the model, such as differences in management practices at different temperatures.

7.3.5 Question 4 – Panel: One way to leverage multiple time series is to group all data into what is called a “panel” regression. Convert the county ID code (“countyfp” or “county_ansi”) into factor using as.factor, then include this variable in a regression using all counties’ yield and summer temperature data. How does the significance of your temperature coefficients (Tmax, Tmax^2) change? Make a plot comparing actual and fitted yields and interpret the results of your model.

# prep data

summertmaxdf <- tmaxdf %>%
  filter(doy >= 152 & doy <= 243) %>%
  group_by(countyfp, year) %>%
  summarize(meantmax = mean(tmax)) %>%
  rename(county_ansi = countyfp)
## `summarise()` has grouped output by 'countyfp'. You can override using the `.groups` argument.
cornyields$county_ansi <- as.factor(cornyields$county_ansi)

summeryieldtemps <- left_join(summertmaxdf, cornyields, by=c('county_ansi', 'year')) %>%
  filter(!is.na(yield))

# add tmaxsq column

summeryieldtemps$tmaxsq <- summeryieldtemps$meantmax^2

# lm for county, yield, and temp

lm_CountyTempYield <- lm(yield ~ meantmax + tmaxsq + year + county_ansi, summeryieldtemps)
summary(lm_CountyTempYield)
## 
## Call:
## lm(formula = yield ~ meantmax + tmaxsq + year + county_ansi, 
##     data = summeryieldtemps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -81.645  -9.720   1.924  13.232  40.409 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -5.815e+03  9.802e+01 -59.319  < 2e-16 ***
## meantmax        1.182e+02  6.108e+00  19.352  < 2e-16 ***
## tmaxsq         -2.225e+00  1.085e-01 -20.503  < 2e-16 ***
## year            2.203e+00  2.836e-02  77.664  < 2e-16 ***
## county_ansi147 -4.601e+00  4.322e+00  -1.065 0.287100    
## county_ansi13  -4.573e+00  4.322e+00  -1.058 0.290041    
## county_ansi99   2.537e+00  4.322e+00   0.587 0.557264    
## county_ansi195 -5.740e+00  4.339e+00  -1.323 0.186000    
## county_ansi67  -7.072e+00  4.324e+00  -1.636 0.102029    
## county_ansi107 -9.660e+00  4.325e+00  -2.233 0.025596 *  
## county_ansi15   3.120e+00  4.322e+00   0.722 0.470379    
## county_ansi57  -1.366e+00  4.327e+00  -0.316 0.752212    
## county_ansi127  2.318e+00  4.321e+00   0.536 0.591719    
## county_ansi123 -3.711e+00  4.325e+00  -0.858 0.390951    
## county_ansi81  -3.157e+00  4.327e+00  -0.729 0.465744    
## county_ansi129 -1.637e+00  4.396e+00  -0.372 0.709703    
## county_ansi1   -1.186e+01  4.325e+00  -2.743 0.006124 ** 
## county_ansi175 -2.341e+01  4.354e+00  -5.377 8.03e-08 ***
## county_ansi19  -4.212e+00  4.324e+00  -0.974 0.330092    
## county_ansi37  -9.693e+00  4.328e+00  -2.240 0.025182 *  
## county_ansi109  3.589e-01  4.324e+00   0.083 0.933863    
## county_ansi165 -4.189e+00  4.322e+00  -0.969 0.332408    
## county_ansi51  -2.644e+01  4.360e+00  -6.063 1.47e-09 ***
## county_ansi69  -5.518e-01  4.322e+00  -0.128 0.898414    
## county_ansi137 -5.977e+00  4.330e+00  -1.381 0.167514    
## county_ansi9   -6.795e+00  4.322e+00  -1.572 0.115958    
## county_ansi87  -6.583e+00  4.328e+00  -1.521 0.128366    
## county_ansi5   -9.147e+00  4.329e+00  -2.113 0.034697 *  
## county_ansi189 -3.627e+00  4.330e+00  -0.838 0.402306    
## county_ansi53  -2.790e+01  4.356e+00  -6.404 1.70e-10 ***
## county_ansi121 -1.445e+01  4.324e+00  -3.342 0.000841 ***
## county_ansi191 -7.295e+00  4.335e+00  -1.683 0.092484 .  
## county_ansi71   1.715e+00  4.344e+00   0.395 0.692933    
## county_ansi135 -2.771e+01  4.354e+00  -6.365 2.20e-10 ***
## county_ansi23  -2.774e+00  4.321e+00  -0.642 0.520945    
## county_ansi179 -1.508e+01  4.358e+00  -3.461 0.000544 ***
## county_ansi11  -4.677e+00  4.321e+00  -1.082 0.279184    
## county_ansi193 -9.064e+00  4.323e+00  -2.097 0.036096 *  
## county_ansi117 -3.354e+01  4.385e+00  -7.650 2.56e-14 ***
## county_ansi139 -3.580e+00  4.324e+00  -0.828 0.407827    
## county_ansi141  2.369e+00  4.321e+00   0.548 0.583576    
## county_ansi3   -1.639e+01  4.325e+00  -3.790 0.000153 ***
## county_ansi29  -5.718e+00  4.325e+00  -1.322 0.186209    
## county_ansi7   -3.014e+01  4.355e+00  -6.922 5.22e-12 ***
## county_ansi95  -4.648e+00  4.322e+00  -1.075 0.282230    
## county_ansi77  -8.484e+00  4.324e+00  -1.962 0.049814 *  
## county_ansi55  -2.440e+00  4.326e+00  -0.564 0.572813    
## county_ansi91  -1.981e+00  4.324e+00  -0.458 0.646825    
## county_ansi79   1.290e+00  4.321e+00   0.299 0.765255    
## county_ansi59  -8.957e+00  4.325e+00  -2.071 0.038446 *  
## county_ansi27  -2.196e+00  4.321e+00  -0.508 0.611316    
## county_ansi187  2.350e+00  4.321e+00   0.544 0.586618    
## county_ansi41  -5.251e+00  4.322e+00  -1.215 0.224409    
## county_ansi45  -1.312e+00  4.321e+00  -0.304 0.761484    
## county_ansi153  2.166e+00  4.325e+00   0.501 0.616434    
## county_ansi65  -4.543e+00  4.329e+00  -1.050 0.293954    
## county_ansi131 -4.578e+00  4.336e+00  -1.056 0.291223    
## county_ansi143 -3.119e+00  4.326e+00  -0.721 0.470936    
## county_ansi197 -3.072e-01  4.322e+00  -0.071 0.943331    
## county_ansi133 -1.106e+01  4.328e+00  -2.556 0.010618 *  
## county_ansi83   2.088e+00  4.321e+00   0.483 0.629024    
## county_ansi35   2.034e+00  4.321e+00   0.471 0.637848    
## county_ansi101 -1.133e+01  4.335e+00  -2.613 0.009012 ** 
## county_ansi21  -3.222e+00  4.321e+00  -0.746 0.455896    
## county_ansi105 -4.132e+00  4.322e+00  -0.956 0.339059    
## county_ansi63  -4.630e+00  4.328e+00  -1.070 0.284740    
## county_ansi125 -9.943e+00  4.325e+00  -2.299 0.021566 *  
## county_ansi157 -1.315e+00  4.323e+00  -0.304 0.761057    
## county_ansi155 -5.942e-01  4.356e+00  -0.136 0.891498    
## county_ansi169 -6.458e-01  4.321e+00  -0.149 0.881201    
## county_ansi145 -1.223e+01  4.330e+00  -2.824 0.004764 ** 
## county_ansi167  3.716e+00  4.321e+00   0.860 0.389950    
## county_ansi159 -3.257e+01  4.325e+00  -7.529 6.38e-14 ***
## county_ansi89  -1.092e+01  4.346e+00  -2.513 0.012022 *  
## county_ansi115 -4.533e+00  4.329e+00  -1.047 0.295100    
## county_ansi173 -2.591e+01  4.357e+00  -5.947 2.99e-09 ***
## county_ansi49  -1.050e+00  4.324e+00  -0.243 0.808084    
## county_ansi151 -3.603e-01  4.321e+00  -0.083 0.933548    
## county_ansi39  -3.591e+01  4.354e+00  -8.246 2.26e-16 ***
## county_ansi17  -5.353e-01  4.323e+00  -0.124 0.901456    
## county_ansi97  -1.325e+01  4.322e+00  -3.065 0.002190 ** 
## county_ansi85  -4.971e+00  4.327e+00  -1.149 0.250644    
## county_ansi149 -4.510e+00  4.322e+00  -1.044 0.296759    
## county_ansi185 -3.376e+01  4.353e+00  -7.754 1.15e-14 ***
## county_ansi47  -5.335e+00  4.321e+00  -1.235 0.217050    
## county_ansi111 -1.008e+01  4.335e+00  -2.326 0.020060 *  
## county_ansi177 -1.714e+01  4.343e+00  -3.947 8.08e-05 ***
## county_ansi75  -3.522e-01  4.321e+00  -0.081 0.935055    
## county_ansi181 -1.402e+01  4.325e+00  -3.242 0.001198 ** 
## county_ansi31   3.927e+00  4.321e+00   0.909 0.363477    
## county_ansi25  -1.476e+00  4.321e+00  -0.342 0.732672    
## county_ansi61  -2.068e+00  4.328e+00  -0.478 0.632814    
## county_ansi113 -5.448e+00  4.321e+00  -1.261 0.207502    
## county_ansi183 -1.446e+00  4.328e+00  -0.334 0.738278    
## county_ansi33  -7.281e+00  4.326e+00  -1.683 0.092484 .  
## county_ansi43  -2.998e+00  4.326e+00  -0.693 0.488271    
## county_ansi73   2.754e+00  4.323e+00   0.637 0.524176    
## county_ansi161 -2.473e+00  4.321e+00  -0.572 0.567164    
## county_ansi103 -7.483e+00  4.322e+00  -1.731 0.083476 .  
## county_ansi119 -2.534e+00  4.321e+00  -0.586 0.557604    
## county_ansi163  4.415e+00  4.321e+00   1.022 0.306959    
## county_ansi171 -2.122e+00  4.321e+00  -0.491 0.623334    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.83 on 3646 degrees of freedom
## Multiple R-squared:  0.7207, Adjusted R-squared:  0.7129 
## F-statistic: 93.13 on 101 and 3646 DF,  p-value: < 2.2e-16
summeryieldtemps$fitted <- lm_CountyTempYield$fitted.values

# Make a plot
ggplot(summeryieldtemps, aes(x= yield, y= fitted)) + geom_point() + geom_smooth() + xlab ('Actual Yield') +ylab('Fitted Yield')
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

This fitted-versus-actual plot shows how well the model's predicted yields match the observed yields, which indicates how capable the model is at predicting corn yield. The points track a 1:1 line fairly closely, suggesting the model performs well, though values between 50 and 100 deviate from the line somewhat. This also indicates that the variables in the model (year, summer temperature, and county) each have an effect on corn yield.
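To see what including the factor county variable in the formula actually does, here is a toy sketch (made-up county labels and values) showing how `lm()` expands a factor into dummy indicator columns via `model.matrix()`:

```r
# Toy panel: three hypothetical counties, two observations each
df <- data.frame(
  county = factor(c("A", "A", "B", "B", "C", "C")),
  tmax   = c(26, 28, 25, 27, 26, 29),
  yield  = c(150, 148, 140, 139, 160, 155)
)

m <- lm(yield ~ tmax + county, data = df)

# The design matrix has an intercept, tmax, and one 0/1 dummy per
# non-reference county; each dummy shifts that county's intercept.
model.matrix(m)
```

This is why the panel regression summary above reports one coefficient per county_ansi level: each is that county's yield offset relative to the reference county.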

7.3.6 Question 5 – Soybeans: Download NASS data on soybean yields and explore either a time series relationship for a given county, the cross-sectional relationship for a given year, or a panel across all counties and years.

# Create dataset for winnesheik county for soybeans
winnesoy <- soybeansyields %>%
  filter(county_name == 'WINNESHIEK')

# Combine Yield and Temp Data Sets
winnesoycombo <- inner_join(winnesoy, winnesummer, by = 'year')

# Create TMaxSq Column in Combo Data Set
winnesoycombo$Tmaxsq <- winnesoycombo$meantmax^2

# Lm for yield and temp
lm_SoyTempYield <- lm(yield ~ year + meantmax + Tmaxsq, winnesoycombo)
summary(lm_SoyTempYield)
## 
## Call:
## lm(formula = yield ~ year + meantmax + Tmaxsq, data = winnesoycombo)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.3872  -2.8753   0.7528   3.3727   8.2051 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.722e+03  2.826e+02  -6.094 6.51e-07 ***
## year         5.271e-01  8.104e-02   6.504 1.92e-07 ***
## meantmax     5.296e+01  1.982e+01   2.672   0.0115 *  
## Tmaxsq      -9.826e-01  3.667e-01  -2.680   0.0113 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.34 on 34 degrees of freedom
## Multiple R-squared:  0.6386, Adjusted R-squared:  0.6067 
## F-statistic: 20.02 on 3 and 34 DF,  p-value: 1.182e-07
winnesoycombo$fitted <- lm_SoyTempYield$fitted.values

# Make a plot
ggplot(winnesoycombo) + geom_point(mapping = aes(x= year, y= yield)) + geom_line(mapping = aes(x = year, y= fitted)) + xlab ('Year') +ylab('Yield')

Soybean yield increases over time, and both temperature terms are significant. As with the corn model for Winneshiek County, yield rises with summer temperature up to an optimum and declines beyond it.

7.3.7 Bonus: Find a package to make a county map of Iowa displaying some sort of information about yields or weather. Interpret your map.

#Create data set of corn yields in 2021 by county
states <- us_states()

counties <- us_counties(states = 'Iowa')
countiesfixed <- counties[!duplicated(as.list(counties))]

countycornyields2021 <- cornyields %>%
  rename(countyfp = 'county_ansi')  %>%
  filter(year == '2021') 

spatialcornyields2021 <- left_join(countiesfixed, countycornyields2021, by = 'countyfp')

iowa <- states %>%
  filter(name == 'Iowa') %>%
  st_transform(2163)

#create map of data with mapview function
mapview(spatialcornyields2021, zcol = "yield")

This map shows that not every county in Iowa produced corn in 2021 (or some did not report their corn yield). The county with the highest yield was Sac County. There is no clear spatial pattern in where corn is or is not produced; overall, corn is grown throughout the entire state of Iowa.